Description

The present invention relates to microprocessor ar- 
chitectures and, in particular, to a microprocessor that 
partially decodes instructions retrieved from external 
memory before storing them in an internal instruction 
cache. Partially decoded instructions are retrieved from 
the internal cache for either parallel or sequential exe- 
cution by multiple, parallel, pipelined functional units. 

In recent years, there has been a trend in the design 
of microprocessor architectures from Complex Instruc- 
tion Set Computers (CISC) toward Reduced Instruction 
Set Computers (RISC) to achieve high performance 
while maintaining simplicity of design. 

In a CISC architecture, each macroinstruction re- 
ceived by the processor must be decoded internally into 
a series of microinstruction subroutines. These microin- 
struction subroutines are then executed by the micro- 
processor. 

In a RISC architecture, the number of macroinstructions
which the processor can understand and execute
is greatly reduced. Further, those macroinstructions 
which the processor can understand and execute are 
very basic so that the processor either does not decode 
them into any microinstructions (the macroinstruction is 
executed in its macro form) or the decoded microinstruc- 
tion subroutine involves very few microinstructions. 

The transition from CISC architectures to RISC ar- 
chitectures has been driven by two fundamental devel- 
opments in computer design that are now being exten- 
sively applied to microprocessors. These developments 
are integrated cache memory and optimizing compilers. 

A cache memory is a small, high speed buffer locat- 
ed between the processor and main memory to hold the 
instructions and data most recently used by the proces- 
sor. Experience shows that computers very commonly 
exhibit strong characteristics of locality in their memory 
references. That is, references tend to occur frequently 
either to locations that have recently been referred to 
(temporal locality) or to locations that are near others 
that have recently been referred to (spatial locality). As 
a consequence of this locality, a cache memory that is 
much smaller than main memory can capture the large 
majority of a program's memory references. Because 
the cache memory is relatively small, it can be realized 
from a faster memory technology than would be eco- 
nomical for the much larger main memory. 

Before the development of cache memory tech- 
niques for use in mainframe computers, there was a 
large imbalance between the cycle time of a processor 
and that of memory. This imbalance was a result of the 
processor being realized from relatively high speed bi- 
polar semiconductor technology and the memory being 
realized from much slower magnetic-core technology. 
The inherent speed difference between logic and mem- 
ory spurred the development of complex instruction sets 
that would permit the fetching of a single instruction from 
memory to control the operation of the processor for
several clock cycles. The imbalance between processor
and memory speeds was also characteristic of the early
generations of 32-bit microprocessors. Those
microprocessors would commonly take 4 or 5 clock cycles for
each memory access.

Without the introduction of integrated cache mem- 
ory, it is unlikely that RISC architectures would have be- 
come competitive with CISC architectures. Because a 
RISC processor executes more instructions than does
a CISC processor to accomplish the same task, a RISC
processor can deliver performance equivalent to that of
a CISC only if a faster and more expensive memory
system is employed. Integrated cache memory enables a
RISC processor to fetch an instruction in the same time
required to execute the instruction by an efficient
processor pipeline.

The second development that has led to the effec- 
tiveness of RISC architectures is optimizing compilers. 
A compiler, which may be implemented in either
hardware or software, translates a computer program from
the high-level language used by the programmer into
the machine language understood by the computer.

For many years after the introduction of high-level 
languages, computers were still extensively
programmed in assembly language. Assembly language is
a low-level source code language employing crude
mnemonics that are more easily remembered by the
programmer than object-code or binary equivalents.
The advantages of improved software productivity and
translatability of high-level language programming were
clear, but simple compilers produced inefficient code.
Early generations of 32-bit microprocessors were devel- 
oped with consideration for assembly language pro- 
gramming and simple compilers. 

More recently, advances in compiler technology are
being applied to microprocessors. Optimizing compilers
can analyze a program to allocate large numbers of
registers efficiently and to manage processor pipeline
resources. As a consequence, high-level language
programs can execute with performance comparable to or
exceeding that of assembly programs.

Many of the leading pioneers in RISC developments 
have been compiler specialists who have demonstrated 
that optimizing compilers can produce highly efficient
code for simple, regular architectures.

Highly integrated single-chip microprocessors em- 
ploy both pipelined and parallel execution to improve 
performance. Pipelined execution means that while the 
microprocessor is fetching one instruction, it can be
simultaneously decoding a second instruction, reading
source operands for a third instruction, calculating
results for a fourth instruction and writing results for a fifth
instruction. Parallel execution means that the
microprocessor can initiate the operands for two or more
independent instructions simultaneously in separate
functional units.

As stated above, one of the main challenges in
designing a high-performance microprocessor with
multiple, pipelined functional units is to provide sufficient
instruction memory on-chip and to access the instruction
memory efficiently to control the functional units.

The requirement for efficient control of a microproc- 
essor's functional units dictates a regular instruction for- 
mat that is simple to decode. However, in conventional 
microprocessor architectures, instructions in main 
memory are highly encoded and of variable length to 
make efficient use of space in main memory and the lim- 
ited bandwidth available between the microprocessor 
and the main memory. 

The present invention is defined by the independent 
claims 1 and 8 and provides a processor and corre- 
sponding method that resolves the conflicting require- 
ments for efficient use of main memory storage space 
and efficient control of the functional units by partially 
decoding instructions retrieved from main memory be- 
fore placing them into the microprocessor's integrated 
instruction cache. Preferably, each entry in the instruc- 
tion cache has two slots for partially decoded instruc- 
tions. One slot controls one of the microprocessor's ex- 
ecution pipelines and a port to its data cache. The sec- 
ond slot controls a second execution pipeline, or one of 
the microprocessor's floating point units, or a control 
transfer instruction. An instruction decoding unit, or 
loader, decodes instructions from their compact format 
as stored in main memory and places them into the two 
slots of the instruction cache entry according to their 
functions. Auxiliary information may also be placed in 
the cache entry along with the instruction to control
parallel execution and emulation of complex instructions. A
bit in each cache entry may indicate whether the
instructions in the two slots for that entry are independent, so
that they can be executed in parallel, or dependent, so 
that they must be executed sequentially. Using a single 
bit for this purpose allows two dependent instructions to 
be stored in the slots of a single cache entry. Otherwise, 
the two instructions would have to be stored in separate 
entries and only one-half of the cache memory would 
be utilized in those two entries. 

Some features of the independent claims are known 
per se. 

US-A-4,873,629 discloses a computer configured 
for optimizing the processing rate of instructions and a 
corresponding method. The computer includes a main 
memory, a cache unit, and a central processing unit. Ac- 
cording to this document (encoded) instructions re- 
trieved from the main memory are "cracked", i.e. the ad- 
dress fields of the instructions are decoded such that 
they can be stored in a logical instruction cache unit. 
When the cracked instructions are retrieved from the 
cache unit for subsequent execution, they are (sequen- 
tially) sent to an output buffer and decoder means where 
a decoding step occurs and furthermore decoded pro- 
gram count and displacement information is generated. 
Decoded instructions then are sent to ALUs for execu- 
tion thereof. 

The technique according to this document therefore
requires time-consuming multiple storage of the
instructions, and in particular teaches decoupling the cache
means and the functional units by intermediate buffer
means.

Another technique is disclosed by EP-A-0 363 222.
This document relates to an apparatus and method for
concurrent dispatch of instruction words. These instructions
can be separately and substantially simultaneously
received by distinct functional units, i.e. a floating
point unit and an integer unit that are part of a
processor. Instructions coming from an external source are
retrieved and stored in an instruction cache, wherein the
cache is partitioned into even and odd cache sections,
each of which is subsequently connected via logic
means to functional units. The logic units include
decoding means that are capable of decoding the encoded
instructions after reception. Accordingly, the instructions
cached in the instruction cache are still encoded.

The known architecture is suitable for Complex
Instruction Set Computers (CISC) with a central
processing unit, but will encounter problems when
transferred to a Reduced Instruction Set Computer (RISC)
architecture. A problem arising when processing
instructions in a RISC architecture is that the number of
instructions understood by the processor is greatly
reduced while, at the same time, these instructions are
performed very quickly. Completely decoding a retrieved
encoded instruction takes an unpredictable amount of
time, so that processor time is difficult to optimize.

An article by Stevens, G.B. et al, "HARP: A parallel
pipelined RISC processor", Microprocessors and
Microsystems, Vol. 13, No. 9, November 1989, pp 579-587,
London, GB, relates to a compiler that packs independent
"HARP" instructions that are executable in parallel into
long instruction words. The long instruction words are
retrieved from an instruction cache, and the component
short instructions of each long instruction word are
passed through a parallel pipeline structure.

There is no teaching that could give a hint to partially
decode encoded instructions before storing them in a 
cache. 

WO-A-90 03 001 discloses a CISC system in which
encoded instructions are retrieved from the main memory
and subsequently stored in a cache means. The
instructions, when retrieved from the cache, are partially
decoded and stored into a FIFO instruction buffer before
execution.

Such a FIFO queue inter alia provides gradual
decoupling of cache and processor unit for buffering time
runouts due to partial decoding. It is evident that this
solution results in a more complicated architecture, and of
course does not teach the invention.

A better understanding of the features and
advantages of the present invention will be obtained by
reference to the following detailed description of the
invention and accompanying drawings which set forth an
illustrative embodiment in which the principles of the
invention are utilized.

Figure 1 is a block diagram illustrating a microproc- 
essor architecture that incorporates the concepts of the 
present invention. 

Figure 2 is a block diagram illustrating the structure 
of a partially decoded instruction cache utilized in the 
figure 1 architecture. 

Figure 3 is a simplified representation of a partially 
decoded entry stored in the instruction cache shown in 
figure 2. 

Figure 4 is a block diagram illustrating the structure 
of the integer pipelines utilized in the microprocessor ar- 
chitecture shown in figure 1. 

Figure 1 shows a block diagram of a microprocessor 
10 that includes multiple, pipelined functional units that 
are capable of executing two instructions in parallel. 

The microprocessor 10 includes three main
sections: an instruction processor 12, an execution
processor 14 and a bus interface processor 16.

The instruction processor 12 includes three mod- 
ules: an instruction loader 18, an instruction emulator 
20 and an instruction cache 22. These modules load in- 
structions from the external system through the bus in- 
terface processor 16, store the instructions in the in- 
struction cache 22 and provide pairs of instructions to 
the execution processor 14 for execution. 

The execution processor 14 includes two 4-stage
pipelined integer execution units 24 and 26, a
double-precision 5-stage pipelined floating point execution unit
28, and a 1024-byte data cache 30. A set of integer
registers 32 services the two integer units 24 and 26;
similarly, a set of floating point registers 34 services the
floating point execution unit 28.

The bus interface processor 16 includes a bus
interface unit 36 and a number of system modules 38. The
bus interface unit 36 controls the bus accesses
requested by both the instruction processor 12 and the
execution processor 14. In the illustrated embodiment, the
system modules 38 include a timer 40, a direct memory
access (DMA) controller 42, an interrupt control unit
(ICU) 44 and I/O buffers 46.

As described in greater detail below, the instruction 
loader 18 partially decodes instructions retrieved from 
main memory and places the partially decoded
instructions in the instruction cache 22. That is, the instruction
loader 18 translates an instruction stored in main
memory (not shown) into the decoded format of the
instruction cache 22. As will also be described in greater detail
below, the instruction loader 18 is also responsible for 
checking whether any dependencies exist between con- 
secutive instructions that are paired in a single instruc- 
tion cache entry. 

The instruction cache 22 contains 512 entries for 
partially-decoded instructions. 

In accordance with one aspect of the present inven- 
tion, and as explained in greater detail below, each entry 
in the instruction cache 22 contains either one or two
instructions stored in a partially-decoded format for
efficient control of the various functional units of the
microprocessor 10.

In accordance with another aspect of the present
invention, each entry in instruction cache 22 also
contains auxiliary information that indicates whether the two
instructions stored in that entry are independent, so that
they can be executed in parallel, or dependent, so that
they must be executed sequentially.

The instruction emulator 20 executes special
instructions defined in the instruction set of the
microprocessor 10. When the instruction loader 18 encounters
such an instruction, it transfers control to the emulator
20. The emulator is responsible for generating a
sequence of core instructions (defined below) that perform
the function of a single complex instruction (defined
below). In this regard, the emulator 20 provides
ROM-resident microcode. The emulator 20 also controls
exception processing and self-test operations.

The two 4-stage integer pipelines 24 and 26
perform basic arithmetic/logical operations and data
memory references. Each integer pipeline 24, 26 can execute
instructions at a throughput of one per system clock
cycle.

The floating point execution unit 28 includes three
sub-units that perform single-precision and
double-precision operations. An FPU adder sub-unit 28a is
responsible for add and convert operations, a second sub-unit
28b is responsible for multiply operations and a third
sub-unit 28c is responsible for divide operations.

When add and multiply operations are alternately 
executed, the floating point execution unit 28 can exe- 
cute instructions at a throughput of one instruction per 
system clock cycle. 

Memory references for the floating point execution
unit 28 are controlled by one of the integer pipelines
24, 26 and can be performed in parallel with floating-point
operations.

Data memory references are performed using the
1-Kbyte data cache 30. The data cache 30 provides fast
on-chip access to frequently used data. In the event that
data are not located in the data cache 30, then off-chip
references are performed by the bus interface unit (BIU)
36 using the pipelined system bus 48.

The data cache 30 employs a load scheduling
technique so that it does not necessarily stall on misses. This
means that the two execution pipelines 24, 26 can
continue processing instructions and initiating additional
memory references while data is being read from main
memory.

The bus interface unit 36 can receive requests for
main memory accesses from either the instruction
processor 12 or the execution processor 14. These requests
are sent to the external pipelined bus 48. The external
bus can be programmed to operate at half the frequency
of the microprocessor 10; this allows for a simple
instruction interface at a relatively low frequency while the
microprocessor 10 executes a pair of instructions at full
rate.

The instruction set of the microprocessor 10 is par- 
titioned into a core part and a non-core part. The core 
part of the instruction set consists of performance critical 
instructions and addressing modes, together with some 
special-function instructions for essential system oper- 
ations. The non-core part consists of the remainder of 
the instruction set. Performance critical instructions and 
addressing modes were selected based on an analysis 
and evaluation of the operating system (UNIX in this 
case) workload and various engineering, scientific and 
embedded controller applications. These instructions 
are executed directly as part of the RISC architecture of 
microprocessor 10. 

As stated above, special-function and non-core in- 
structions are emulated in microprocessor 10 by mac- 
roinstruction subroutines using sequences of core in- 
structions. That is, instructions that are a part of the 
overall instruction set of the microprocessor 10 architec- 
ture, but that lie outside the directly-implemented RISC 
core, are executed under control of the instruction em- 
ulator 20. When the instruction loader 18 encounters a 
non-core instruction, it either translates it into a pair of
core instructions (for simple instructions like
MOVB 1(R0),0(R1)) or transfers control to the instruction
emulator 20. The instruction emulator 20 is responsible for
generating a sequence of core instructions that perform 
the function of the single, complex instruction. 

Fig. 2 shows the structure of the instruction cache 
22. The instruction cache 22 utilizes a 2-way, set-asso- 
ciative organization with 512 entries for partially decod- 
ed instructions. This means that for each memory ad- 
dress there are two entries in the instruction cache 22 
where the instruction located at that address can be 
placed. The two entries are called a "set". 

As shown in Fig. 3, each instruction cache entry in- 
cludes two slots, i.e. Slot A and Slot B. Thus, each entry 
can contain one or two partially-decoded instructions 
that are represented with fixed fields for opcode (Opc), 
source and destination register numbers (R1 and R2, 
respectively), and immediate values (32b IMM). The en- 
try also includes auxiliary information used to control the 
sequence of instruction execution, including a bit P that 
indicates whether the entry contains two consecutive in- 
structions that can be executed in parallel and a bit G 
that indicates whether the entry is for a complex instruc- 
tion that is emulated, and additional information repre- 
senting the length of the instruction(s) in a form that al- 
lows fast calculation of the next instruction's address. 
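
By way of illustration only, the entry format of Fig. 3 can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the actual bit-level layout: the class names `Slot` and `CacheEntry`, the `length` field expressed in bytes, and the `next_address` helper are all hypothetical, since the description only states that length information is kept in a form that allows fast calculation of the next instruction's address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One partially decoded instruction (fixed fields of Fig. 3)."""
    opc: str                  # opcode (Opc)
    r1: Optional[int] = None  # source register number (R1)
    r2: Optional[int] = None  # destination register number (R2)
    imm: int = 0              # immediate value (32b IMM)


@dataclass
class CacheEntry:
    """One instruction cache entry holding one or two instructions."""
    slot_a: Slot
    slot_b: Optional[Slot] = None  # None when only one instruction is stored
    p: bool = False                # P: the two instructions may run in parallel
    g: bool = False                # G: the entry emulates a complex instruction
    length: int = 0                # ASSUMPTION: encoded length of the entry in bytes

    def next_address(self, pc: int) -> int:
        # ASSUMPTION: the next address is the entry address plus its length,
        # wrapped to a 32-bit address space.
        return (pc + self.length) & 0xFFFFFFFF
```

For example, an entry built with `length=6` and fetched at address `0x1000` yields `0x1006` as the next instruction address; the length value is purely illustrative.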

Referring back to Fig. 2, associated with each entry 
in the instruction cache 22 is a 26-bit tag, TAG0 and 
TAG1, respectively, that holds the 22 most-significant 
bits, 3 least-significant bits and a User/Supervisor bit of 
the virtual address of the instruction stored in the entry. 
In the event that two consecutive instructions are paired 
in an entry, the tag corresponds to the instruction at the 
lower address. Associated with the tag are 2 bits that 
indicate whether the entry is valid and whether it is
locked. For each set there is an additional single bit that
indicates the entry within the set that is next to be
replaced in a Least-Recently-Used (LRU) order.

The instruction cache 22 is enabled for an
instruction fetch if a corresponding bit of the configuration
register of microprocessor 10, which is used to enable or
disable various operating modes of the microprocessor
10, is 1 and either address translation is disabled or the
CI-bit is 0 in the level-2 Page Table Entry (PTE) used to
translate the virtual address of the instruction.

If the instruction cache 22 is disabled, then the
instruction fetch bypasses the instruction cache 22 and
the contents of the instruction cache 22 are unaffected.
The instruction is read directly from main memory,
partially decoded by the instruction loader 18 to form an
entry (which may contain two partially decoded
instructions), and transferred to the integer pipelines 24, 26 via
the IL BYPASS line for execution.

As shown in Fig. 2, if the instruction cache 22 is
enabled for an instruction fetch, then eight bits, i.e. bits PC
(10:3), of the instruction's address provided by the
program counter (PC) are decoded to select the set of
entries where the instruction may be stored. The selected
set of two entries is read and the associated tags are

compared with the 22 most-significant bits, i.e. PC(31:10),
and 3 least-significant bits PC(2:0) of the
instruction's virtual address. If one of the tags matches and the
matching entry is valid, then the entry is selected for
transfer to the integer pipelines 24, 26 for execution.
Otherwise, the missing instruction is read directly from main
memory and partially decoded, as explained below.
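
The lookup just described can be sketched as follows. This is only an illustrative model, assuming the tag fields given above for TAG0 and TAG1 (the 22 most-significant address bits, the 3 least-significant address bits and the User/Supervisor bit) and a set index taken from PC(10:3); the names `Way`, `set_index`, `make_tag`, `new_cache` and `lookup` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WAYS = 2        # 2-way set-associative organization
NUM_SETS = 256  # 512 entries / 2 ways, selected by the eight bits PC(10:3)


@dataclass
class Way:
    valid: bool = False
    locked: bool = False
    tag: Optional[tuple] = None
    entry: object = None  # the partially decoded entry (opaque here)


def set_index(pc: int) -> int:
    """Decode bits PC(10:3) to select one of the 256 sets."""
    return (pc >> 3) & 0xFF


def make_tag(pc: int, supervisor: bool) -> tuple:
    """ASSUMPTION: tag = (PC(31:10), PC(2:0), User/Supervisor bit)."""
    return ((pc >> 10) & 0x3FFFFF, pc & 0x7, supervisor)


def new_cache() -> list:
    return [[Way() for _ in range(WAYS)] for _ in range(NUM_SETS)]


def lookup(cache: list, pc: int, supervisor: bool):
    """Return the matching valid entry, or None on a miss."""
    wanted = make_tag(pc, supervisor)
    for way in cache[set_index(pc)]:
        if way.valid and way.tag == wanted:
            return way.entry
    return None
```

A miss is reported both when no tag in the selected set matches and when the matching entry is not valid; in either case the instruction would be read from main memory as explained below.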

If the referenced instruction is missing from the
instruction cache 22 and the contents of the selected set
are all locked, then the handling of the reference is
identical to that described above for the case when the
instruction cache 22 is disabled.

If the referenced instruction is missing from the
instruction cache 22 and at least one of the entries in the
selected set is not locked, then the following actions are
taken. One of the entries is selected for replacement
according to the least recently used (LRU) replacement
algorithm and then the LRU pointer is updated. If the
entry selected for replacement is locked, then the
handling of the reference is identical to that described above
for the case when the instruction cache 22 is disabled.
Otherwise, the missing instruction is read directly from
external memory and then partially decoded by
instruction loader 18 to form an entry (that may contain two
partially decoded instructions) which is transferred to
the integer pipelines 24, 26 for execution. If CIIN is not
active during the bus cycles to read the missing
instruction, then the partially decoded instruction is also written
into the instruction cache entry selected for
replacement, the associated valid bit is set, and the entry is
locked if Lock-Instruction-Cache bit CFG.LIC in the
configuration register is 1.
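
The miss handling described above can be summarized in a short sketch. It is a simplified model under stated assumptions: the single LRU bit is treated as the index of the way to replace next and is simply toggled when a victim is chosen, updates of the LRU bit on cache hits are not modeled, and the names `Way`, `CacheSet` and `handle_miss` as well as the returned strings are hypothetical. In every case the partially decoded entry is still passed to the integer pipelines; the return value only reports what happens to the cache.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    valid: bool = False
    locked: bool = False
    tag: object = None
    entry: object = None


@dataclass
class CacheSet:
    ways: list = field(default_factory=lambda: [Way(), Way()])
    lru: int = 0  # single bit: index of the way that is next to be replaced


def handle_miss(cset: CacheSet, tag, entry, ciin_active: bool, cfg_lic: bool) -> str:
    """Model the cache side of a miss in one set."""
    if all(way.locked for way in cset.ways):
        return "bypass"               # same handling as a disabled cache
    victim = cset.lru
    cset.lru ^= 1                     # ASSUMPTION: update the LRU pointer by toggling
    if cset.ways[victim].locked:
        return "bypass"               # selected entry is locked
    if ciin_active:
        return "executed_uncached"    # CIIN active: execute but do not store
    way = cset.ways[victim]
    way.valid, way.tag, way.entry = True, tag, entry
    way.locked = cfg_lic              # lock the entry if CFG.LIC is 1
    return "filled"
```

Note that in this model a miss can be treated as a bypass even when the other entry of the set is unlocked, which follows the order of the steps given above.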

After the microprocessor 10 has completed fetching 
a missing instruction from external main memory, it will
continue prefetching sequential instructions. For
subsequent sequential instruction fetches, the
microprocessor 10 searches the instruction cache 22 to determine
whether the instruction is located on-chip. If the search 
is successful or a non-sequential instruction fetch oc- 
curs, then the microprocessor 10 ceases prefetching. 
Otherwise, the prefetched instructions are rapidly avail- 
able for decoding and executing. The microprocessor 
10 initiates prefetches only during bus cycles that would 
otherwise be idle because no off-chip data references 
are required. 

It is possible to fetch an instruction and lock it into 
the instruction cache 22 without having to execute the 
instruction. This can be accomplished by enabling a
Debug Trap (DBG) for a Program Counter value that
matches the instruction's address. Debug Trap is a
service routine that performs actions appropriate to this
type of exception. At the conclusion of the DBG routine,
the REturn to Execution (RETX) instruction is executed
to resume executing instructions at the point where the
exception was recognized. The instruction will be
fetched and placed into the instruction cache 22 before
the trap is processed.

When the instruction which is locked in the instruc- 
tion cache 22 gets to execution and a Debug Trap on 
that instruction is enabled, instead of executing the in- 
struction, the processor will jump to the Debug Trap 
service routine. The service routine may set a break- 
point for the next instruction so that when the processor 
returns from the service routine, it will not execute the 
next instruction but rather will go again to the Debug 
Trap routine. 

The process described above, which usually gets 
executed during system bootstrap, allows the user to 
store routines in the instruction cache 22, lock them and 
have them ready for operation without executing them 
during the locking process. 

Further information relating to the architecture of 
microprocessor 10 and its cache locking capabilities is 
provided in EP-A-0 459 233. 

The contents of the instruction cache 22 can be in- 
validated by software or by hardware. 

The instruction cache 22 is invalidated by software 
as follows: The entire instruction cache contents, includ- 
ing locked entries, are invalidated while bit CFG.IC of 
the Configuration Register is 0. The LRU replacement 
information is also initialized to 0 while bit CFG.IC is 0. 
The Cache Invalidate (CINV) instruction can also be executed to
invalidate the instruction cache contents. Executing
CINV invalidates either the entire cache or only
unlocked lines according to the instruction's L-option.

The entire instruction cache 22 is invalidated in 
hardware by activating an INVIC input signal. 

Fig. 3 shows a simplified view of a partially decoded 
entry stored in the instruction cache 22. As shown in Fig. 
3, each entry has two slots for instructions. Slot A con- 
trols integer pipeline 24 and the port to data cache 30. 
Slot B controls the second integer pipe 26, or one of the 
floating point units or a control transfer instruction. Slot
B can also control the port to data cache 30, but only if
slot A is not using the data cache 30. As stated above,
instruction loader 18 retrieves encoded instructions
from their compact format in main memory and places
them into slots A and B according to their functions.

Thus, in accordance with the present invention, the
novel aspects of instruction cache 22 include (1)
partially decoding instructions for storage in cache memory,
(2) placing of instructions into two cache slots according
to their function and (3) placing auxiliary information in
the cache entries along with the instructions to control
parallel execution and emulation of complex
instructions.

As further shown in Fig. 3, a bit P in each instruction
cache entry indicates whether the instructions in slots A 
and B are independent, so they can be executed in par- 
allel, or dependent, so they must be executed sequen- 
tially. 

An example of independent instructions that can be 
executed in parallel is: 

Load 4(R0),R1 ; Addcd 4,R0

An example of dependent instructions requiring
sequential execution is:

Addd R0,R1 ; Addd R1,R2

Using a single bit for this purpose allows two
dependent instructions to be stored in the slots of a single
cache entry; otherwise, the two instructions would have
to be stored in separate entries and only 1/2 of the
instruction cache 22 would be utilized in those two entries.
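
One way the loader might derive the P bit is sketched below. The description does not spell out the exact dependency rule, so the rule used here is an assumption: two instructions are treated as dependent when the second reads or writes a register that the first writes. The names `Instr` and `p_bit` and the register sets attached to each instruction are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset   # registers read by the instruction
    writes: frozenset  # registers written by the instruction


def p_bit(first: Instr, second: Instr) -> bool:
    """Return True when the pair may be executed in parallel.

    ASSUMPTION: the pair is dependent if the second instruction reads
    (read-after-write) or writes (write-after-write) a register that
    the first instruction writes.
    """
    raw = first.writes & second.reads
    waw = first.writes & second.writes
    return not (raw or waw)


# The two examples given above, with register usage written out by hand.
load = Instr("Load 4(R0),R1", frozenset({"R0"}), frozenset({"R1"}))
add4 = Instr("add 4 to R0", frozenset({"R0"}), frozenset({"R0"}))
add_a = Instr("Addd R0,R1", frozenset({"R0", "R1"}), frozenset({"R1"}))
add_b = Instr("Addd R1,R2", frozenset({"R1", "R2"}), frozenset({"R2"}))
```

Under this rule the first pair is independent because the second instruction touches only R0, which the load merely reads, while the second pair is dependent because the second add reads R1, which the first add writes.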

Fig. 3 also shows a bit G in each instruction cache 
entry that indicates whether the instructions in slots A 
and B are emulating a single, more complex instruction 
from main memory. For example, the loader translates 
the single instruction ADDD 0(R0),R1 into the following
pair of instructions in slots A and B and sets the
sequential and emulation flags in the entry:

Load 0(R0),Temp
ADDD Temp,R1

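
The translation in this example can be sketched as a small function. This is a hedged illustration only: the function name `translate`, the tuple representation of instructions and the rule for recognizing a memory source operand are assumptions, and the "sequential" flag is modeled simply as the P bit being cleared.

```python
def translate(instr: tuple) -> dict:
    """Translate one (mnemonic, source, destination) instruction into an entry.

    ASSUMPTION: a source operand containing '(' is a memory operand,
    as in 0(R0); only the ADDD case from the example is handled.
    """
    mnemonic, src, dst = instr
    if mnemonic == "ADDD" and "(" in src:
        return {
            "slot_a": ("Load", src, "Temp"),    # fetch the memory operand
            "slot_b": ("ADDD", "Temp", dst),    # add it to the destination
            "P": False,                         # sequential: slot B needs Temp
            "G": True,                          # pair emulates one instruction
        }
    # Register-only form: stored as a single core instruction.
    return {"slot_a": instr, "slot_b": None, "P": False, "G": False}
```

Calling `translate(("ADDD", "0(R0)", "R1"))` produces the Load/ADDD pair shown above with the emulation flag set.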
In accordance with the pipelined organization of the 
microprocessor 10, every instruction executed by the 
microprocessor 10 goes through a series of stages. The 
two integer pipelines 24, 26 (Fig. 1) are able to work in
parallel on instruction pairs. Integer unit 24 and integer
unit 26 are not identical, the instructions that can be ex- 
ecuted in integer unit 24 being a sub-set of those that 
can be executed in integer unit 26. 

As stated above, instruction fetching is performed 
by the instruction loader 18 which stores decoded in- 
structions in the instruction cache 22. The integer dual- 
pipe receives decoded instruction-pairs for execution. 

Referring again to Fig. 3, as stated above, an in- 
struction pair consists of two slots: Slot A and Slot B. 
The instruction in Slot A is scheduled for integer unit 24; 
the instruction in Slot B is scheduled for integer unit 26. 
Two instructions belonging to the same pair advance at 
the same time from one stage of the integer pipeline to 
the next, except in the case when the instruction in Slot 
B is delayed in the instruction decode stage of the
pipeline as described below. In this case, the instruction in
integer pipeline 24 can advance to the following pipeline
stages. However, new instructions cannot enter the
pipeline until the instruction decode stage is free in both
pipeline unit 24 and pipeline unit 26.

Although the unit 24 and unit 26 instructions are ex- 
ecuted in parallel (except in the case of the stall ID-B 
instruction), the Slot A instruction always logically pre- 
cedes the corresponding Slot B instruction and, if the 
Slot A instruction cannot be completed due to an excep- 
tion, then the corresponding Slot B instruction is discard- 
ed. 

Referring to Fig. 4, each of the integer pipeline units 
24, 26 includes four stages: an instruction decode stage 
(ID), an execute stage (EX), a memory access stage 
(ME) and a store result stage (ST). 

An instruction is fed into the ID stage of the integer 
unit for which it is scheduled where its decoding is com- 
pleted and register source operands are read. In the EX 
stage, the arithmetic/logical unit of the microprocessor 
10 is activated to compute the instruction's results or to 
compute the effective memory address for Load/Store 
instructions. In the ME stage, the data cache 30 (Fig. 1)
is accessed by Load/Store instructions and exception 
conditions are checked. In the ST stage, results are writ- 
ten to the register file, or to the data cache 30 in the case 
of a Store instruction, and Program Status Register 
(PSR) flags are updated. At this stage, the instruction 
can no longer be undone. 

As further shown in Fig. 4, results from the EX stage 
and the ME stage can be fed back to the ID stage, thus 
enabling instruction latency of 1 or 2 cycles. 
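
The latency figures can be illustrated with a small Python sketch. It
assumes one reading of the sentence above, namely that a result fed
back from the EX stage corresponds to the 1-cycle latency and a result
fed back from the ME stage to the 2-cycle latency. The names
FORWARD_LATENCY and earliest_consumer_id_cycle are hypothetical and
serve only this illustration.

```python
# Assumed reading of the description: a result fed back from EX is usable
# 1 cycle after the producer occupied ID; a result fed back from ME,
# 2 cycles after.
FORWARD_LATENCY = {"EX": 1, "ME": 2}


def earliest_consumer_id_cycle(producer_id_cycle: int, result_stage: str) -> int:
    """Earliest cycle in which a dependent instruction can occupy ID.

    producer_id_cycle -- cycle in which the producing instruction was in ID
    result_stage      -- stage whose output is fed back ("EX" or "ME")
    """
    return producer_id_cycle + FORWARD_LATENCY[result_stage]


if __name__ == "__main__":
    # Producer in ID at cycle 10: it is in EX at cycle 11 and ME at cycle 12.
    print(earliest_consumer_id_cycle(10, "EX"))
    print(earliest_consumer_id_cycle(10, "ME"))
```

Under this reading, an instruction that consumes an arithmetic result
can follow its producer directly, whereas an instruction that consumes
loaded data trails its producer by one additional cycle.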

In the absence of any delays, the dual execution
pipeline of microprocessor 10 accepts a new instruction
pair every clock cycle (i.e., a peak throughput of two
instructions per cycle) and advances all other instructions
one stage along the pipeline. The dual pipeline
includes a global stalling mechanism by which any
functional unit can stall the pipeline if it detects a hazard.
Each stall holds the corresponding stage and all stages
preceding it for one more cycle. When a stage stalls, it
keeps the instruction currently residing in it for another
cycle and then restarts all stage activities exactly as in
the non-stalled case.
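
The stalling behavior described above can be sketched in Python as a
single-cycle update of a four-stage pipeline. This is a minimal model
under stated assumptions, not the circuit of the microprocessor 10: the
function name step, the list representation, and the choice to leave an
empty stage directly behind a stalled stage are all assumptions of this
sketch.

```python
STAGES = ["ID", "EX", "ME", "ST"]


def step(pipeline, incoming, stall_stage=None):
    """Advance the pipeline by one clock cycle.

    pipeline    -- list of 4 entries ordered ID, EX, ME, ST (None = empty)
    incoming    -- instruction pair waiting to enter ID (or None)
    stall_stage -- name of the stage that detected a hazard, or None
    Returns (new_pipeline, retired), where retired is what left ST.
    """
    if stall_stage is None:
        # No stall: accept the new pair and move everything one stage on.
        return [incoming] + pipeline[:-1], pipeline[-1]

    k = STAGES.index(stall_stage)
    new = list(pipeline)  # stage k and all preceding stages hold
    retired = None
    if k < len(STAGES) - 1:
        retired = pipeline[-1]  # ST lies after the stall, so it drains
        for i in range(len(STAGES) - 1, k + 1, -1):
            new[i] = pipeline[i - 1]  # stages after the stall advance
        # Assumption of this sketch: the stage right after the stalled
        # one receives nothing this cycle and is left empty.
        new[k + 1] = None
    return new, retired


if __name__ == "__main__":
    pipe = ["p3", "p2", "p1", "p0"]
    print(step(pipe, "p4"))        # normal cycle
    print(step(pipe, "p4", "EX"))  # EX detects a hazard
```

In the stalled case the waiting pair "p4" is not accepted, which
matches the statement that the stalled stage and all stages preceding
it are held for one more cycle.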

The pipeline unit on which each instruction is to be
executed is determined at run time by the instruction
loader 18 when instructions are fetched from main
memory.

The instruction loader 18 decodes prefetched in- 
structions, tries to pack them into instruction pair entries 
and presents them to the dual-pipeline. If the instruction 
cache 22 is enabled (as discussed above), cacheable 
instructions can be stored in the instruction cache 22. In 
this case, an entry containing an instruction pair or a 
single instruction is also sent to the instruction cache 22 
and stored there as a single cache entry. On instruction
cache hits, stored instruction pairs are retrieved from the
instruction cache 22 and presented to the dual-pipeline
for execution.

The instruction loader 18 attempts to pack instruc- 
tions into pairs whenever possible. The packing of two 
instructions into one entry is possible only if the first in- 
struction can be executed by integer pipeline unit 24 and 
both instructions are less than a preselected maximum 
length. If it is impossible to pack two instructions into a 
pair, then a single instruction is placed in Slot B. 

Two instructions can be paired only when all of the
following conditions hold: (1) both instructions are
performance-critical core instructions, (2) the first
instruction is executable by integer pipeline unit 24, and
(3) the displacement and immediate fields in both
instructions use short encoding (11 bits for all
instructions except branches, and 17 bits for the
Conditional Branch and Branch and Link instructions).
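
The three pairing conditions can be sketched as a Python predicate,
together with a simple packer that falls back to a single-instruction
entry in Slot B when pairing fails. The Instr record, its field names
and the functions short_encoded, can_pair and pack are hypothetical
names chosen for this illustration. The sketch also assumes that the
queue holds only core instructions, since the handling of other
instructions is not covered by the conditions above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    name: str
    core: bool               # performance-critical core instruction
    unit24_ok: bool          # executable by integer pipeline unit 24
    field_bits: int = 0      # encoded immediate/displacement width
    is_branch: bool = False  # Conditional Branch or Branch and Link


def short_encoded(i: Instr) -> bool:
    """Condition (3): 11 bits, or 17 bits for the branch forms."""
    limit = 17 if i.is_branch else 11
    return i.field_bits <= limit


def can_pair(first: Instr, second: Instr) -> bool:
    """True when conditions (1) to (3) all hold for the candidate pair."""
    return (first.core and second.core          # condition (1)
            and first.unit24_ok                 # condition (2)
            and short_encoded(first)            # condition (3)
            and short_encoded(second))


def pack(queue: List[Instr]) -> List[Tuple[Optional[Instr], Instr]]:
    """Greedy packing into (Slot A, Slot B) entries.

    Assumes every queued instruction is a core instruction. A single
    instruction is placed in Slot B, with Slot A left empty (None).
    """
    entries = []
    i = 0
    while i < len(queue):
        if i + 1 < len(queue) and can_pair(queue[i], queue[i + 1]):
            entries.append((queue[i], queue[i + 1]))
            i += 2
        else:
            entries.append((None, queue[i]))
            i += 1
    return entries


if __name__ == "__main__":
    add = Instr("ADD", core=True, unit24_ok=True, field_bits=5)
    br = Instr("BR", core=True, unit24_ok=False, field_bits=17, is_branch=True)
    for slot_a, slot_b in pack([add, br, add]):
        print(slot_a.name if slot_a else "-", slot_b.name)
```

Marking the branch as not executable by unit 24 keeps it out of Slot A,
while its 17-bit displacement still qualifies it for Slot B.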

Several instructions of the microprocessor 10 in- 
struction set are restricted to run on integer pipeline unit 
26 only. For example, because instruction pairs in the 
instruction cache 22 are tagged by the Slot A address, 
it is not useful to put a Branch instruction in Slot A since 
the corresponding Slot B instruction will not be accessi- 
ble. Similarly, since there is a single arithmetic floating 
point pipe, it is not possible to execute two arithmetic 
floating point instructions in parallel. Restricting these 
instructions to integer pipeline unit 26 makes it possible 
to considerably simplify the dual-pipe data path design 
without hurting performance. 

Integer unit 26 can execute any instruction in the
microprocessor 10 instruction set.

The instruction loader 18 initiates instruction pairing
upon an instruction cache miss, in which case it begins
prefetching instructions into an instruction queue. In
parallel, the instruction loader 18 examines the next
instruction not yet removed from the instruction queue and
attempts to pack it according to the following algorithm:

Step 1: Try to fit the next instruction into Slot A.

(a) if the next instruction is not performance-critical,
then go to Step 5.

(b) remove the next instruction from the instruction
queue and tentatively place it in Slot A.

(c) if the instruction is illegal for Slot A, or if the
instruction has an immediate/displacement field that
cannot be represented in 11 bits, or if the instruction
is not quad-word aligned, then go to Step 4.

(d) otherwise, continue to Step 2.

Step 2: Try to fit the next instruction into Slot B.

(a) if the next instruction is not performance-critical,
or the next instruction has an encoded immediate/
displacement field longer than 11 bits, or the next
instruction is a branch with displacement longer
than 17 bits, then go to Step 4.

(b) otherwise, remove the next instruction from the
instruction queue, place it in Slot B and go to Step 3.

Step 3: Construct an instruction pair entry.

In this case, both Slot A and Slot B contain valid
instructions and all pairing conditions are satisfied.
Issue a pair entry and go to Step 1.

Step 4: Construct a single instruction entry.

In this case, Slot A contains an instruction which
cannot be paired. Move this instruction to Slot B. If this
instruction contains an immediate/displacement field
longer than 17 bits, or it is a branch with displacement
longer than 17 bits, and is not quad-word aligned, then
replace it with UNDefined. Issue the entry and go to
Step 1.

Step 5: Handle non-performance-critical instructions.

Remove the next instruction from the instruction
queue and send it to the instruction emulator 20. When
finished with this instruction, go to Step 1.

The just-described pairing algorithm packs two
instructions whenever they can be held in a single
instruction cache entry. However, these instructions may
happen to be dependent, in which case they cannot be
executed in parallel. The dependencies are detected by
the execution processor 14.



Claims

1. A processor that executes instructions, comprising:

a processor unit (14) with a plurality of functional
units (24, 26, 28) for execution of instructions
in parallel;

first retrieving means (18) for retrieving an
encoded instruction from an external main memory;

decoding means (18, 20) for decoding encoded
instructions;

internal cache memory storage means (22)
comprising a plurality of cache memory storage
locations for storing instructions; and

second retrieving means for simultaneously
retrieving a plurality of instructions from selected
cache memory storage locations for execution
by said functional units (24, 26, 28),

wherein

the internal cache memory storage means (22)
is conceived for storing partially decoded
instructions,

the encoded instructions retrieved from the
main memory are partially decoded by said
decoding means (18, 20) before being stored in
said internal cache memory storage means
(22), and

said partially decoded instructions retrieved
from said internal cache memory storage
means (22) are sent directly to the functional
units (24, 26, 28) by said second retrieving
means.

2. The processor according to claim 1, wherein each
of said cache memory storage locations comprises
a plurality of storage slots, each of said storage slots
comprising means for storing a partially decoded
instruction.



3. The processor according to claim 2, wherein said
second retrieving means is capable of simultaneously
retrieving a plurality of partially decoded
instructions from said storage slots of a selected
cache memory storage location for parallel execution
by said plurality of functional units (24, 26, 28).

4. The processor according to one of claims 1 to 3,
wherein each of said cache memory storage locations
includes means for storing auxiliary information
indicative of whether the plurality of instructions
stored in said slots of a cache memory storage
location are independent such that the instructions
may be executed in parallel or dependent such that
the instructions must be executed sequentially.

5. The processor according to one of claims 1 to 4,
wherein said external main memory is connected to
the processor by means of a system bus, said system
bus being connected to a bus interface unit (36)
for retrieving encoded core instructions and encoded
non-core instructions from said external main
memory.

6. The processor according to one of claims 1 to 5,
wherein the decoding means (18, 20) comprises an
instruction loader (18) for translating a first encoded
core instruction to a first partially-decoded instruction
and a second encoded core instruction to a second
partially-decoded instruction, further including
means responsive to a received non-core instruction,
wherein

said internal cache memory storage means
(22) comprises said plurality of cache memory
storage locations, each cache memory storage
location comprising a plurality of storage slots,
each of the storage slots comprising means for
storing a decoded instruction; and

means for simultaneously retrieving a plurality
of decoded instructions from the storage slots
of a selected cache memory storage location
for parallel or sequential execution by the
plurality of functional units, and wherein

each of the cache memory storage locations
includes means for storing auxiliary information
indicative of whether the plurality of instructions
stored in the slots of a cache memory storage
location are independent such that the instructions
may be executed in parallel or dependent
such that the instructions must be executed
sequentially.

7. The processor according to one of claims 1 to 6, 
wherein the internal cache memory storage means 
(22) comprises a two-way set-associative organiza- 
tion. 

8. A method of executing instructions in a processor, 
said processor comprising: 

a processor unit (14) with a plurality of function- 
al units (24, 26, 28) for execution of instructions 
in parallel; 

first retrieving means (18) for retrieving an en- 
coded instruction from the external main mem- 
ory; 

decoding means (18,20) for decoding encoded 
instructions; 

internal cache memory storage means (22) 
comprising a plurality of cache memory storage 
locations for storing instructions; and 
second retrieving means for simultaneously re- 
trieving a plurality of instructions from selected 
cache memory storage locations for execution 
by said functional units (24, 26, 28), 

said method comprising the following steps: 

(a) retrieving encoded instructions from said 
external main memory; 

(b) partially decoding the instructions retrieved 
in retrieving step (a); 

(c) storing the instructions partially decoded in 
decoding step (b) in said internal cache mem- 
ory storage means (22); and 

(d) retrieving the partially decoded instructions 
stored in storing step (c) for subsequent execu- 
tion by said plurality of functional units, 

wherein the steps (a) through (c) are per- 
formed before step (d) is performed, whereby par- 
tially decoded instructions are directly delivered to 
the functional units (24, 26, 28) thus enhancing the 
processing speed. 

9. The method of claim 8, including the step of storing 
auxiliary information in the cache memory storage 
locations, the auxiliary information being indicative 
of whether the plurality of instructions stored in the 
slots of a cache memory storage location are inde- 
pendent such that the instructions may be executed 
in parallel, or dependent such that the instructions 
must be executed sequentially.

10. The method of claim 8 or claim 9, wherein said
internal cache memory storage means comprises
said plurality of cache memory storage locations,
each cache memory storage location comprising a
plurality of storage slots, each of the storage slots
comprising means for storing a decoded instruction;
and

simultaneously retrieving a plurality of decod- 
ed instructions from the storage slots of a selected 
cache memory storage location for parallel or se- 
quential execution by the plurality of functional 
units, and including the step of storing auxiliary in- 
formation in the cache memory storage locations, 
the auxiliary information being indicative of whether 
the plurality of instructions stored in the slots of a 
cache memory storage location are independent 
such that the instructions may be executed in par- 
allel, or dependent such that the instructions must 
be executed sequentially. 



Patentansprüche (Claims)

1. A processor that executes instructions, having:

a processor unit (14) containing a plurality of
functional units (24, 26, 28) for executing
instructions in parallel,

first retrieving means (18) for retrieving an encoded
instruction from an external main memory;

decoding means (18, 20) for decoding encoded
instructions;

internal cache memory means (22) containing a
plurality of cache memory locations for storing
instructions; and

second retrieving means for simultaneously
retrieving a plurality of instructions from selected
cache memory locations for their execution by the
functional units (24, 26, 28),

wherein

the internal cache memory means (22) is arranged
to store partially decoded instructions,

the encoded instructions retrieved from the main
memory are partially decoded by the decoding
means (18, 20) before they are stored in the
internal cache memory means (22), and

the partially decoded instructions retrieved from
the internal cache memory means (22) are sent
directly to the functional units (24, 26, 28) by the
second retrieving means.

2. The processor according to claim 1, wherein each of
the cache memory locations contains a plurality of
storage slots, each of which contains means for
storing a partially decoded instruction.

3. The processor according to claim 2, wherein the
second retrieving means can simultaneously retrieve
a plurality of partially decoded instructions from the
storage slots of a selected cache memory location so
that they can be executed in parallel by the plurality
of functional units (24, 26, 28).

4. The processor according to one of claims 1 to 3,
wherein each of the cache memory locations contains
means for storing auxiliary information indicating
whether the plurality of instructions stored in the
slots of a cache memory location are independent, so
that the instructions can be executed in parallel, or
dependent, so that the instructions must be executed
sequentially.

5. The processor according to one of claims 1 to 4,
wherein the external main memory is connected to the
processor by a system bus which is connected to a bus
interface unit (36) for retrieving encoded core
instructions and encoded non-core instructions from
the external main memory.

6. The processor according to one of claims 1 to 5,
wherein the decoding means (18, 20) contains an
instruction loader (18) for translating a first encoded
core instruction into a first partially decoded
instruction and a second encoded core instruction into
a second partially decoded instruction, and further
contains means responsive to a received non-core
instruction, wherein

the internal cache memory means (22) comprises
the plurality of cache memory locations, each
cache memory location containing a plurality of
storage slots, each of which contains means for
storing a decoded instruction; and

means for simultaneously retrieving a plurality of
decoded instructions from the storage slots of a
selected cache memory location for parallel or
sequential execution by the plurality of functional
units, and wherein

each of the cache memory locations contains
means for storing auxiliary information indicating
whether the plurality of instructions stored in the
slots of a cache memory location are independent,
so that the instructions can be executed in
parallel, or dependent, so that the instructions
must be executed sequentially.

7. The processor according to one of claims 1 to 6,
wherein the internal cache memory means (22) contains
a two-way set-associative organization.

8. A method of executing instructions in a processor,
the processor containing:

a processor unit (14) containing a plurality of
functional units (24, 26, 28) for executing
instructions in parallel;

first retrieving means (18) for retrieving an encoded
instruction from the external main memory;

decoding means (18, 20) for decoding encoded
instructions;

internal cache memory means (22) containing a
plurality of cache memory locations for storing
instructions; and

second retrieving means for simultaneously
retrieving a plurality of instructions from selected
cache memory locations for execution by the
functional units (24, 26, 28),

the method containing the following steps:

(a) retrieving encoded instructions from the external
main memory;

(b) partially decoding the instructions retrieved in
retrieving step (a);

(c) storing the instructions partially decoded in
decoding step (b) in the internal cache memory
means (22); and

(d) retrieving the partially decoded instructions
stored in storing step (c) for subsequent execution
by the plurality of functional units,

wherein steps (a) to (c) are carried out before step (d)
is carried out, wherein partially decoded instructions
are sent directly to the functional units (24, 26, 28),
whereby the processing speed is increased.

9. The method according to claim 8, with the step of
storing auxiliary information in the cache memory
locations, the auxiliary information indicating whether
the plurality of instructions stored in the slots of a
cache memory location are independent, so that the
instructions can be executed in parallel, or dependent,
so that the instructions must be executed sequentially.

10. The method according to claim 8 or 9, in which the
internal cache memory means contains the plurality of
cache memory locations, each cache memory location
containing a plurality of storage slots, each of which
contains means for storing a decoded instruction; and

with the step of simultaneously retrieving a
plurality of decoded instructions from the storage
slots of a selected cache memory location for
parallel or sequential execution by the plurality of
functional units, and with the step of storing
auxiliary information in the cache memory
locations, the auxiliary information indicating
whether the plurality of instructions stored in the
slots of a cache memory location are independent,
so that the instructions can be executed in
parallel, or dependent, so that the instructions
must be executed sequentially.



Revendications (Claims)

1. A processor that executes instructions, comprising:

a processor unit (14) provided with a plurality of
functional units (24, 26, 28) for the execution of
instructions in parallel,

first retrieval means (18) for retrieving an encoded
instruction from an external main memory;

decoding means (18, 20) for decoding encoded
instructions;

internal cache storage means (22) comprising a
plurality of cache storage locations for storing
instructions; and

second retrieval means for simultaneously
retrieving a plurality of instructions from selected
cache storage locations for their execution by said
functional units (24, 26, 28), wherein

the internal cache storage means (22) are designed
to store partially decoded instructions,

the encoded instructions retrieved from the main
memory are partially decoded by said decoding
means (18, 20) before being stored in said internal
cache storage means (22), and

said partially decoded instructions retrieved from
said internal cache storage means (22) are sent
directly to the functional units (24, 26, 28) by said
second retrieval means.

2. The processor according to claim 1, wherein each of
said cache storage locations comprises a plurality of
storage slots, each of said storage slots comprising
means for storing a partially decoded instruction.

3. The processor according to claim 2, wherein said
second retrieval means are capable of simultaneously
retrieving a plurality of partially decoded instructions
from said storage slots of a selected cache storage
location for their execution in parallel by said
plurality of functional units (24, 26, 28).

4. The processor according to one of claims 1 to 3,
wherein each of said cache storage locations comprises
means for storing auxiliary information indicating
whether the plurality of instructions stored in said
slots of a cache storage location are independent, so
that the instructions can be executed in parallel, or
dependent, so that the instructions must be executed
in sequence.

5. The processor according to one of claims 1 to 4,
wherein said external main memory is connected to the
processor by means of a system bus, said system bus
being connected to a bus interface unit (36) for
retrieving critical encoded instructions and
non-critical encoded instructions from said external
main memory.

6. The processor according to one of claims 1 to 5,
wherein the decoding means (18, 20) comprise an
instruction loader (18) for translating a first critical
encoded instruction into a first partially decoded
instruction and a second critical encoded instruction
into a second partially decoded instruction, further
comprising means responsive to the receipt of a
non-critical instruction, wherein

said internal cache storage means (22) comprise
said plurality of cache storage locations, each
cache storage location comprising a plurality of
storage slots, each of the storage slots comprising
means for storing a decoded instruction; and

means for simultaneously retrieving a plurality of
decoded instructions from the storage slots of a
selected cache storage location for their execution
in parallel or in sequence by the plurality of
functional units, and wherein

each of the cache storage locations comprises
means for storing auxiliary information indicating
whether the plurality of instructions stored in the
slots of a cache storage location are independent,
so that the instructions can be executed in
parallel, or dependent, so that the instructions
must be executed in sequence.

7. The processor according to one of claims 1 to 6,
wherein the internal cache storage means (22) comprise
a two-way associative organization.

8. A method of executing instructions in a processor,
said processor comprising:

a processor unit (14) provided with a plurality of
functional units (24, 26, 28) for the execution of
instructions in parallel;

first retrieval means (18) for retrieving an encoded
instruction from the external main memory;

decoding means (18, 20) for decoding encoded
instructions;

internal cache storage means (22) comprising a
plurality of cache storage locations for storing
instructions; and

second retrieval means for simultaneously
retrieving a plurality of instructions from selected
cache storage locations for their execution by said
functional units (24, 26, 28),

said method comprising the following steps:

(a) retrieving encoded instructions from said
external main memory;

(b) partially decoding the instructions retrieved in
retrieving step (a);

(c) storing the instructions partially decoded in
decoding step (b) in said internal cache storage
means (22); and

(d) retrieving the partially decoded instructions
stored in storing step (c) for their subsequent
execution by said plurality of functional units,

wherein steps (a) to (c) are carried out before step (d)
is carried out, so that partially decoded instructions
are delivered directly to the functional units (24, 26,
28), thus improving the processing speed.

9. The method according to claim 8, comprising the step
of storing auxiliary information in the cache storage
locations, the auxiliary information indicating whether
the plurality of instructions stored in the slots of a
cache storage location are independent, so that the
instructions can be executed in parallel, or dependent,
so that the instructions must be executed in sequence.

10. The method according to claim 8 or 9, wherein said
internal cache storage means (22) comprise said
plurality of cache storage locations, each cache
storage location comprising a plurality of storage
slots, each of the storage slots comprising means for
storing a decoded instruction; and

a plurality of decoded instructions are
simultaneously retrieved from the storage slots of
a selected cache storage location for their
execution in parallel or in sequence by the
plurality of functional units, and comprising the
step of storing auxiliary information in the cache
storage locations, the auxiliary information
indicating whether the plurality of instructions
stored in the slots of a cache storage location are
independent, so that the instructions can be
executed in parallel, or dependent, so that the
instructions must be executed in sequence.